01 - Hypotheses and Statistical Inference

University of San Francisco

Matt Meister

Outline

  • Hypotheses and Predictions
    • Importance of formulating clear hypotheses
    • Examples of clear hypotheses in market research
  • Tests of Hypotheses
    • A weighted coin?
    • Simulation
    • p-values

What is a hypothesis?

What is a hypothesis?

In speaking terms:

  • A hypothesis is a specific prediction about the world.
  • This prediction can be about the past or future
    • e.g., People bought product y for reason x
    • or People will buy product y if x is true
  • Examples?

What is a hypothesis?

In statistical terms:

  • A hypothesis is a specific prediction about how variables behave.
  • Either:
    • How one variable behaves (prediction)
    • How multiple variables relate to each other (hypothesis)
  • Compared to chance

Why are clear hypotheses important?

Your hypotheses guide the rest of the research process. Ideally:

  • Hypotheses inform research design, including:
    • Data collection
      • What kind of data collection is not affected by our hypotheses?
        • Secondary, when the data already exist
    • Data manipulation/wrangling
      • What data we include, how we summarize it
    • Visualization
      • What we want to look for in the data
    • Statistical testing
      • Does the data tell us anything important?

Examples of unclear hypotheses?

  • People like shoes
    • No clear comparison, no defined scope/setting
  • People give shoes high ratings
    • Again, not a clear comparison. What is high?
    • More clear?
    • People rate shoes higher than 3/5 on average

Hypothesis Testing

  • Why do we make predictions?
    • Because we want to understand how the world works
    • As marketers, we want to understand what information is worth knowing about products, customers, markets, etc.
    • Because if we understand what is worth knowing, we can understand:
      • What data to collect
      • Who/where to target
      • What new offerings may be viable
      • etc.
  • So when we formulate hypotheses and run tests, think:
    • What information may be worth knowing? (Hypotheses)
    • Is that information worth knowing? (Tests)

Hypothesis Testing: The Role of “Chance”

  • All statistical tests are in some way comparing observed data to random chance.
  • Including simple examples:
    • Is this coin “fair”?
      • Flip the coin a number of times
      • Count heads and tails
      • Test if heads come up roughly 50% of the time
  • And more complicated ones
    • Do customers prefer product A or product B?
      • Collect many responses for both products
      • Summarize them in some way
      • Compare this result to a “random draw”

Tests of Hypotheses

Tests of Hypotheses

Let’s start with the simpler example of coin flips. Imagine you are a kid (again?)

  • You and your sibling are responsible for doing the dishes
  • Instead of just taking turns, your sibling suggests a gamble: Each night they will flip a coin.
  • HEADS, you do the dishes. TAILS, they do the dishes.
  • After three weeks, you’ve done the dishes 16 times and your sibling only 5 times.
  • You suspect the coin is… RIGGED (!)
  • But how can you be certain?
  • Can you be certain?

The Null Hypothesis

Your hypothesis: THE COIN IS RIGGED!! Specifically:

  • HEADS is significantly more likely to come up than TAILS.

However, the thing you can actually test is the null hypothesis.

  • The null hypothesis is the complement of your hypothesis:
  • H0
    • The coin is equally likely to land on HEADS as on TAILS.
  • In most cases, including this one, the null hypothesis is that any differences you observed between outcomes were the result of chance.

Why can we test the null hypothesis, but not your actual hypothesis?

  • The null hypothesis provides a specific model for the state of the world.
    • That any given coin flip has a 50% chance of landing on HEADS and a 50% chance of landing on TAILS.
  • The actual hypothesis (that the coin is rigged) is more vague.
    • Is HEADS 60% likely? 70%? Do you even care?

Is the Data Consistent with the Null Hypothesis?

Assess whether the data we observed (16 HEADS out of 21 flips) is consistent with the null hypothesis.

  • Is this a result that would likely arise by chance alone?
    • If not, that would constitute evidence of foul play (and support of our actual hypothesis).
  • Who thinks this result constitutes evidence of foul play?
    • 16/21 = ~76% of flips

Not so fast!!

  • Imagine flipping a (truly fair) coin twice.
  • Four possible outcomes, all equally likely:
    • HEADS-HEADS
    • HEADS-TAILS
    • TAILS-HEADS
    • TAILS-TAILS
  • Would it be suspicious if I got HEADS-HEADS?
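The four outcomes above can be enumerated directly in R. This is a small sketch (the `twoFlips` name is made up for illustration); `expand.grid()` builds every combination of its arguments:

```r
aCoin <- c("HEADS", "TAILS") # the same "coin" we build later

# every possible outcome of two flips, one row per outcome
twoFlips <- expand.grid(firstFlip = aCoin, secondFlip = aCoin)
print(twoFlips) # 4 rows: HH, TH, HT, TT

# each of the 4 rows is equally likely, so HEADS-HEADS happens 1/4 of the time
mean(twoFlips$firstFlip == "HEADS" & twoFlips$secondFlip == "HEADS") # 0.25
```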

Is the Data Consistent with the Null Hypothesis?

Would it be suspicious if I got HEADS-HEADS?

  • Probably not, as this happens 1 out of every 4 times in expectation
  • We take this logic further with hypothesis testing:
    • Is it possible to flip a perfectly fair coin 21 times and get 16 HEADS?
      • Yes!
    • It’s even possible to flip 21 HEADS.
    • So we can never be certain, from data alone, about the truth of our hypothesis.
    • Instead, we can quantify how likely any given outcome would be if the null hypothesis is true.

Quantifying likelihood under the null hypothesis

Testing the Null Hypothesis with Simulation

One approach is to use theoretical math.

  • In a way, I did this when I said that the likelihood of flipping two HEADS in two flips is 25%.
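For this coin example, the theoretical math is the binomial distribution. A quick sketch using base R's `dbinom()` and `pbinom()` (these functions aren't part of the slides, but the results line up with both the 25% figure above and the simulation below):

```r
# probability of exactly 2 HEADS in 2 fair flips
dbinom(2, size = 2, prob = 0.5) # 0.25

# probability of 16 or more HEADS in 21 fair flips:
# pbinom(15, ...) is P(X <= 15), so lower.tail = FALSE gives P(X >= 16)
pbinom(15, size = 21, prob = 0.5, lower.tail = FALSE) # ~0.0133
```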

Another approach is to use simulation.

  • Simulate a lot of outcomes given the assumption of the null hypothesis
    • (e.g., each toss is equally likely to be HEADS or TAILS)
  • We “flip” 21 fair coins many, many times
    • Each time counting the number of HEADS out of the 21 flips.
  • We call each set of 21 flips a simulated sample
    • We’ll generate a lot of simulated samples
  • Then, we assess the likelihood of any given result by examining how frequently that result occurred over the entirety of the simulated samples

Quantifying likelihood under the null hypothesis

  • With simulation
  • Let’s flip some coins!

Open up RStudio

Today’s pRogramming

We are going to dive into R.

We’ll just use it to do stuff.

There is extra R help on Canvas:

  • R Help Module
    • Intro to R.html & Plotting.html

Quantifying likelihood under the null hypothesis

Testing the Null Hypothesis with Simulation

Create a “coin”

aCoin <- c("HEADS", "TAILS") # create a "coin"
  • When R sees an assignment arrow (<-), it evaluates the code to the right of the arrow and saves the result under the name to the left of the arrow
  • The c() bit is a function
    • Anytime you see parentheses in R, we are calling a function
    • The name of the function is left of the (), the arguments are inside
    • Here we are saying “concatenate ‘HEADS’ and ‘TAILS’ into a vector”

TERMINOLOGY ALERT

  • Function:
    • A bit of R code that will perform some task we want to repeat
    • Anytime you see parentheses in R, we are calling a function
    • The name of the function is left of the (), the arguments are inside
    • Here we are saying “concatenate ‘HEADS’ and ‘TAILS’ into a vector”
  • Vector:
    • Like a column in Excel
    • A column of numbers/words/characters/etc

Quantifying likelihood under the null hypothesis

Use sample() to simulate flipping the coin.

sample( aCoin, 
        size=1, 
        replace=TRUE ) # "flip" it once
[1] "HEADS"
  • Run this a few times in R. It should be generating HEADS or TAILS with equal likelihood.

TERMINOLOGY ALERT

  • Arguments:
    • The things inside of the () in an R function
    • These tell the function what you want
    • Arguments often have defaults
      • Try running sample( aCoin, size=1)
      • The default is not to replace
      • When we simulate things, we replace so that the distribution of any one draw stays consistent
  • Comments (#)
    • These let you write text to yourself and others in your code
    • R stops reading a line when it hits one
    • Anything left of a comment runs
    • Anything right does not
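To see why `replace = TRUE` matters here, try sampling from our two-sided "coin" without replacement (a small sketch):

```r
aCoin <- c("HEADS", "TAILS")

# without replacement, each face can only be drawn once:
# two draws always give exactly one HEADS and one TAILS -- not real coin flips
sample(aCoin, size = 2, replace = FALSE)

# and more than two draws is impossible without replacement
# sample(aCoin, size = 3, replace = FALSE) # error: sample larger than population

# with replacement, every draw is an independent 50/50 flip
sample(aCoin, size = 3, replace = TRUE)
```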

Quantifying likelihood under the null hypothesis

Use sample() to simulate flipping the coin

Then modify the code a bit to generate and store 21 flips of the coin.

aCoin <- c("HEADS", "TAILS") # create a "coin"
twentyOneFlips <- sample( aCoin, size=21, replace=TRUE ) # "flip" it 21 times
print( twentyOneFlips ) # print the flips
 [1] "HEADS" "HEADS" "HEADS" "HEADS" "HEADS" "HEADS" "HEADS" "TAILS" "TAILS"
[10] "HEADS" "HEADS" "TAILS" "TAILS" "HEADS" "TAILS" "HEADS" "TAILS" "TAILS"
[19] "HEADS" "TAILS" "TAILS"

Pause

  • We’re seeing a lot of new functions today
  • It might seem like you should memorize them and their arguments
  • Don’t
  • There are too many. Instead, focus on the logic/grammar
    • First comes the name
    • Then the ( opens it all up
    • Then I give arguments with argument = what I want
    • Then I close it off with )
  • If you get stuck, Google things
    • I don’t recommend using chatGPT for these classes
    • It’s going to give you some crazy shit, and it only works well when you already know how to give a good prompt

Quantifying likelihood under the null hypothesis

Count how many heads you got

nHeads <- sum( twentyOneFlips == "HEADS" ) # count the number of HEADS out of 21
print( nHeads ) # print the number of heads
[1] 12
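Why does sum() count HEADS? The == comparison produces a logical (TRUE/FALSE) vector, and R treats TRUE as 1 and FALSE as 0 when summing. A tiny illustration (using a made-up three-flip vector, not our real flips):

```r
threeFlips <- c("HEADS", "TAILS", "HEADS") # made-up example flips

threeFlips == "HEADS"      # TRUE FALSE TRUE
sum(threeFlips == "HEADS") # TRUE counts as 1, FALSE as 0, so this is 2
```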

Quantifying likelihood under the null hypothesis

Testing the Null Hypothesis with Simulation

Now, do this a whole bunch of times.

  • Using a loop
    • In each iteration, the loop will create a single simulated sample:
# setting up the simulation
nSims <- 10000 # number of simulations
aCoin <- c("HEADS", "TAILS") # create a "coin"
nHeads <- rep(NA, times=nSims) # an empty vector to hold simulation results
# loop to conduct simulation
for (i in 1:nSims) {
  twentyOneFlips <- sample( aCoin, size=21, replace=TRUE ) # flip 21 coins
  nHeads[i] <- sum( twentyOneFlips == "HEADS" ) # count and store the number of HEADS
}

You may notice there is no “output” for this simulation (i.e., nothing was printed).

  • That was deliberate. And good!
  • It would be practically useless to print the results for each sample.
  • Instead, we store the result of interest from each sample in the nHeads vector.

Quantifying likelihood under the null hypothesis

Testing the Null Hypothesis with Simulation

We can look at that vector in a number of ways.

  • We can use a table:
table(nHeads)
nHeads
   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17 
   2    7   25   94  249  565  980 1354 1697 1730 1393  975  551  246   95   32 
  18 
   5 
  • We can use a histogram
hist(nHeads, breaks = 20)

Why would we use a table and histogram?

Testing Our Coin

Out of 10,000 simulated samples, I flipped exactly 16 HEADS in 95 (or 0.95%) of the samples.

  • If the null hypothesis were true, this result would be pretty unlikely
    • That is, if the coin being flipped was fair
  • You may be tempted to take the percentage of simulated samples that had exactly 16 HEADS and call it a p-value
    • I have evidence the coin wasn’t fair (p = 0.01)!
  • Is this correct?
  • No!
    • A p-value represents the likelihood of observing a certain outcome or an outcome more extreme
sum( nHeads >= 16 ) # samples with 16 or more HEADS
[1] 132
sum( nHeads >= 16 )/nSims # ...as a proportion
[1] 0.0132

We could say something like this:

“The evidence suggests that your sibling was flipping a coin biased to land on HEADS (simulated one-tailed p = 0.013).”

Testing Our Coin

“The evidence suggests that your sibling was flipping a coin biased to land on HEADS (simulated one-tailed p = 0.013).”

What did I mean by “one-tailed”?

hist(nHeads, breaks = 20)

  • Is this reasonable?

In this case, a one-sided test is reasonable.

  • We have a directional prediction
    • The null hypothesis says the coin is fair.
    • Our hypothesis is that it is biased to land on HEADS.
  • That being said, in many research applications we often start with a directional hypothesis, but use a two-sided test.

Two-Sided Tests

Two-tailed tests consider extreme outcomes on both sides of the distribution.

  • It should be equally unlikely under the null hypothesis to observe 16 or more TAILS out of 21 flips.
    • (5 or fewer heads)
hist(nHeads, breaks = 20)
# count samples with 16 or more HEADS and 5 or fewer HEADS
sum( nHeads >= 16 ) + sum( nHeads <= 5)
[1] 260
(sum( nHeads >= 16 ) + sum( nHeads <= 5))/nSims # ...as a proportion
[1] 0.026

Your Sibling Probably Rigged the Coin Flips

A fair coin, flipped 21 times, only has about a 2.6% chance of landing on HEADS 16 or more times.

  • But you still can’t be 100% certain
    • There is a chance your sibling just got lucky.

How could you be more certain?

  • Collect more data
    • Flip it another 100 times or so and tally the results
      • The bigger your sample size, the more power you have to detect systematic differences.
  • You could also look for other evidence
    • Has your sibling been searching the internet for ways to rig coin flips?
  • But, there will always be some uncertainty. That’s just how it is.
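A quick sketch of why more data helps: the same proportion of HEADS (about 76%) is far more surprising under the null hypothesis when it comes from 100 flips than from 21. (The numbers below use the exact binomial probabilities rather than simulation; the 100-flip scenario is a hypothetical for illustration.)

```r
# one-tailed P(X >= 16) out of 21 fair flips: ~0.013
pbinom(15, size = 21, prob = 0.5, lower.tail = FALSE)

# the same ~76% HEADS rate out of 100 fair flips: P(X >= 76) is vanishingly small
pbinom(75, size = 100, prob = 0.5, lower.tail = FALSE)
```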

“Rejecting” the Null Hypothesis

Traditionally (and somewhat arbitrarily), researchers will often declare a result “significant” if p < .05 (for a two-tailed test)

  • When a result has p < .05, researchers will often say that they can “reject the null hypothesis”
  • Thus, there is a tendency for people to consider results with p-values less than .05 as “true” and dismiss those with p-values greater than .05 as noise.
  • We should be more nuanced as researchers by recognizing that evidence comes in varying strengths and to treat each piece of evidence accordingly.
  • I don’t think the “accept”/“reject” language helps.
  • Instead, I prefer language that I think is more accurate:
    • Evidence is either consistent with or inconsistent with the null hypothesis

Questions on p-values?